Abstract

The purpose of this paper is to investigate the convergence rates of a sequence of
empirical Bayes decision rules for two-action decision problems where the distributions
of the observations belong to a discrete exponential family. The sequence of empirical
Bayes decision rules under study is shown to be asymptotically optimal, with convergence
rate of order $O(\exp(-cn))$ for some positive constant $c$, where $n$ is the number of
accumulated past observations at hand. Two examples are provided to illustrate the
performance of the proposed empirical Bayes decision rules. A comparison is also made
between the proposed empirical Bayes rules and some earlier empirical Bayes rules.
1. Introduction


The  empirical  Bayes  approach  in  statistical  decision  theory  is  appropriate  when  one 
is  confronted  repeatedly  and  independently  with  the  same  decision  problem.  In  such 
instances,  it  is  reasonable  to  formulate  the  component  problem  in  the  sequence  as  a  Bayes 
decision  problem  with  respect  to  an  unknown  prior  distribution  on  the  parameter  space  and 
then  use  the  accumulated  observations  to  improve  the  decision  rule  at  each  stage.  This 
approach is due to Robbins (1956, 1964, 1983). Many such empirical Bayes rules have
been shown to be asymptotically optimal in the sense that the risk for the $n$th decision
problem converges to the optimal Bayes risk, that is, the risk that would have been
obtained if the prior distribution were fully known and the Bayes rule with respect to
this prior distribution were used.

The  usefulness  of  empirical  Bayes  rules  in  practical  applications  clearly  depends  on 
the  convergence  rates  with  which  the  risks  for  the  successive  decision  problems  approach 
the  optimal  Bayes  risk.  The  purpose  of  this  paper  is  to  investigate  the  convergence  rates  of 
a  sequence  of  empirical  Bayes  rules  for  two-action  decision  problems  when  the  distributions 
of  the  observations  belong  to  a  discrete  exponential  family. 

Let $X$ be a random observation with probability function of the form

(1.1)  $f(x \mid \theta) = h(x)\,\theta^{x}\,\beta(\theta), \quad x = 0, 1, 2, \ldots;\ 0 < \theta < Q,$

where $h(x) > 0$ for all $x = 0, 1, 2, \ldots$, and where $Q$ may be finite or infinite. The
observation $X$ may be thought of as the value of a sufficient statistic based on several iid
observations. Consider testing $H_0: \theta \ge \theta_0$ against $H_1: \theta < \theta_0$, where
$\theta_0$ is a known positive constant. For each $i = 0, 1$, let $i$ denote the action deciding in
favor of $H_i$. For the parameter $\theta$ and action $i$, the loss function is defined as

(1.2)  $L(\theta, i) = (1 - i)(\theta_0 - \theta)\, I_{(0,\theta_0)}(\theta) + i\,(\theta - \theta_0)\, I_{[\theta_0, Q)}(\theta),$

where $I_A(\cdot)$ denotes the indicator function of the set $A$. In (1.2), the first term is the
loss due to taking action 0 when $\theta < \theta_0$, and the second term is the loss due to taking
action 1 when $\theta \ge \theta_0$. It is assumed that $\theta$ is the value of a random variable
$\Theta$ having an unknown prior distribution $G(\theta)$.

For a decision rule $d$, let $d(x) = P\{\text{accepting } H_0 \mid X = x\}$. That is, $d(x)$ is the
probability of taking action 0 given $X = x$. Let $D$ be the class of all decision rules. For
each decision rule $d$, let $r(G, d)$ denote the associated Bayes risk. Then
$r(G) = \inf_{d \in D} r(G, d)$ is the minimum Bayes risk over the class $D$.

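Based on the statistical model described above, the Bayes risk associated with the
decision rule $d$ is

(1.3)  $r(G, d) = \sum_{x=0}^{\infty} [\theta_0 - \varphi(x)]\, d(x)\, f(x) + C,$

where

(1.4)  $\varphi(x) = \dfrac{h(x)\, f(x+1)}{h(x+1)\, f(x)},$

(1.5)  $f(x) = \int_0^Q f(x \mid \theta)\, dG(\theta),$

(1.6)  $C = \sum_{x=0}^{\infty} \int_{\theta_0}^{Q} (\theta - \theta_0)\, f(x \mid \theta)\, dG(\theta).$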


We consider only priors $G$ such that $\int_0^Q \theta\, dG(\theta) < \infty$, to ensure that the
risk is always finite.
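For readers who want the intermediate step behind (1.3), the following is a sketch of the computation, reconstructed here from (1.1) and (1.2) rather than quoted from the original text. It uses only the definitions above: the risk is the expected loss of the randomized rule $d$, the difference of the two losses is linear in $\theta$, and multiplying $f(x \mid \theta)$ by $\theta$ shifts the exponent $x$ by one.

```latex
\begin{align*}
r(G,d) &= \sum_{x=0}^{\infty} \int_0^Q
          \bigl[ d(x)\,L(\theta,0) + (1-d(x))\,L(\theta,1) \bigr]
          f(x \mid \theta)\, dG(\theta), \\
L(\theta,0) - L(\theta,1)
       &= (\theta_0-\theta)\, I_{(0,\theta_0)}(\theta)
          - (\theta-\theta_0)\, I_{[\theta_0,Q)}(\theta)
        = \theta_0 - \theta, \\
\int_0^Q \theta\, f(x \mid \theta)\, dG(\theta)
       &= \frac{h(x)}{h(x+1)} \int_0^Q h(x+1)\,\theta^{x+1}\beta(\theta)\, dG(\theta)
        = \frac{h(x)}{h(x+1)}\, f(x+1)
        = \varphi(x)\, f(x), \\
r(G,d) &= \sum_{x=0}^{\infty} [\theta_0 - \varphi(x)]\, d(x)\, f(x)
          + \sum_{x=0}^{\infty} \int_0^Q L(\theta,1)\, f(x \mid \theta)\, dG(\theta).
\end{align*}
```

Since $L(\theta,1) = (\theta-\theta_0) I_{[\theta_0,Q)}(\theta)$, the last sum does not involve $d$ and is the constant $C$.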


Note that $C$ is a constant which is independent of the decision rule $d$. Thus, from
(1.3), a Bayes decision rule, say $d_G$, is clearly given by

(1.7)  $d_G(x) = \begin{cases} 1 & \text{if } \varphi(x) \ge \theta_0, \\ 0 & \text{otherwise.} \end{cases}$
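The Bayes rule compares $\varphi(x)$ with $\theta_0$, which can be sketched in a few lines of Python. This is an illustration only. The function names are hypothetical, and the numerical setting (a Poisson likelihood with a standard exponential prior, so that the marginal is $(1/2)^{x+1}$, and the value $\theta_0 = 1.25$) is an assumption chosen for the example.

```python
from math import gamma


def phi(x, h, f):
    """phi(x) = h(x) f(x+1) / (h(x+1) f(x)), as in (1.4)."""
    return h(x) * f(x + 1) / (h(x + 1) * f(x))


def bayes_rule(x, h, f, theta0):
    """Return d_G(x): 1 means accept H0 (action 0), 0 means take action 1."""
    return 1 if phi(x, h, f) >= theta0 else 0


# Assumed illustration: Poisson likelihood, Exp(1) prior.
h = lambda x: 1.0 / gamma(x + 1)   # h(x) = 1 / x!
f = lambda x: 0.5 ** (x + 1)       # marginal f(x) = (1/2)^(x+1)
theta0 = 1.25                      # hypothetical threshold

rule = [bayes_rule(x, h, f, theta0) for x in range(6)]
print(rule)
```

In this setting $\varphi(x) = (x+1)/2$, so the rule switches from action 1 to action 0 at the first $x$ with $(x+1)/2 \ge \theta_0$.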


Since the prior distribution $G$ is unknown, it is not possible to apply the Bayes rule
to the decision problem at hand. In this situation, we use the empirical Bayes approach.
Johns and Van Ryzin (1971) have studied the above decision problem via the
empirical Bayes approach. In this paper, a sequence of empirical Bayes decision rules
$\{d_n^*\}$ is proposed for this decision problem, and its asymptotic optimality is
investigated. It is found that the rate of convergence of $\{d_n^*\}$ is of order
$O(\exp(-cn))$ for some positive constant $c$, where $n$ is the number of accumulated past
observations at hand. Two examples are given to illustrate the performance
of the proposed empirical Bayes decision rules. A comparison is also made between the
proposed empirical Bayes rules and some earlier empirical Bayes rules.


2. The Proposed Empirical Bayes Rules and Their Asymptotic Optimality

For each $j = 1, 2, \ldots$, let $(X_j, \Theta_j)$ be a pair of random variables, where $X_j$ is
observable but $\Theta_j$ is not observable. Conditional on $\Theta_j = \theta$, $X_j$ has
probability function $f(x \mid \theta)$. It is assumed that $\Theta_j$, $j = 1, 2, \ldots$, are
independently distributed with common unknown prior distribution $G$. Therefore,
$(X_j, \Theta_j)$, $j = 1, 2, \ldots$, are iid. Let $\mathbf{X}_n = (X_1, \ldots, X_n)$ denote
the $n$ past observations and let $X_{n+1} = X$ denote the current random observation.


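According to (1.4) and (1.7), an empirical Bayes decision rule, say $d_n^*$, is proposed as
follows. First, for each $x = 0, 1, 2, \ldots$, let

(2.1)  $f_n(x) = \dfrac{1}{n} \sum_{j=1}^{n} I_{\{x\}}(X_j) + \delta_n,$

where $\delta_n$ is a positive value such that $\delta_n = o(1)$. Then, let

(2.2)  $\varphi_n(x) = \dfrac{h(x)\, f_n(x+1)}{h(x+1)\, f_n(x)}.$

We then define

(2.3)  $\varphi_n^*(x) = \bigl[\max_{0 \le y \le x} \varphi_n(y)\bigr] \wedge Q,$

where $a \wedge b = \min\{a, b\}$. Finally, the empirical Bayes decision rule $d_n^*$ is defined as

(2.4)  $d_n^*(x) = \begin{cases} 1 & \text{if } \varphi_n^*(x) \ge \theta_0, \\ 0 & \text{otherwise.} \end{cases}$

Note that the past data $\mathbf{X}_n$ is implicitly contained in the subscript $n$.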
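The construction in (2.1) to (2.4) can be sketched in Python as follows. This is a minimal illustration and not the authors' implementation. The function name `make_eb_rule`, the default choice $\delta_n = 1/n$ (the text only requires a positive $\delta_n = o(1)$), the Poisson choice of $h$, and the toy sample are all assumptions.

```python
from collections import Counter
from math import gamma, inf


def make_eb_rule(past, h, theta0, Q=inf, delta=None):
    """Build phi*_n and d*_n from the past observations, following (2.1)-(2.4)."""
    n = len(past)
    if delta is None:
        delta = 1.0 / n          # assumed choice; any positive delta_n = o(1) is allowed
    counts = Counter(past)

    def f_n(x):                  # (2.1): empirical frequency plus delta_n
        return counts.get(x, 0) / n + delta

    def phi_n(x):                # (2.2)
        return h(x) * f_n(x + 1) / (h(x + 1) * f_n(x))

    def phi_star(x):             # (2.3): running maximum, capped at Q
        return min(max(phi_n(y) for y in range(x + 1)), Q)

    def d_star(x):               # (2.4): 1 = accept H0
        return 1 if phi_star(x) >= theta0 else 0

    return phi_star, d_star


# Hypothetical toy data with the Poisson weight h(x) = 1/x!.
past = [0, 0, 1, 0, 2, 1, 0, 3]
h = lambda x: 1.0 / gamma(x + 1)
phi_star, d_star = make_eb_rule(past, h, theta0=1.25)
decisions = [d_star(x) for x in range(4)]
print(decisions)
```

Taking the running maximum in `phi_star` is what makes the resulting rule nondecreasing in $x$, even when the raw ratios $\varphi_n(x)$ fluctuate.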

Definition 2.1. A decision rule $d$ is said to be monotone if for $x, y \ge 0$ with $x \le y$,
$d(x) \le d(y)$.

Note that from (2.3), $\varphi_n^*(x)$ is nondecreasing in $x$. Then, by (2.4), we see that
$d_n^*(x)$ is a monotone decision rule.

In the following, the asymptotic optimality of the sequence of proposed empirical
Bayes decision rules $\{d_n^*\}$ is investigated. The monotonicity of the decision rules
$\{d_n^*\}$ is used to obtain the asymptotic optimality.

Consider an empirical Bayes decision rule $d_n(x)$. Let $r(G, d_n)$ be the Bayes risk
associated with the rule $d_n$. Then,

(2.5)  $r(G, d_n) = \sum_{x=0}^{\infty} [\theta_0 - \varphi(x)]\, E[d_n(x)]\, f(x) + C,$

where the expectation $E$ is taken with respect to $\mathbf{X}_n$. Since $r(G)$ is the minimum
Bayes risk, $r(G, d_n) - r(G) \ge 0$ for all $n$. Thus, the nonnegative difference
$r(G, d_n) - r(G)$ is used as a measure of the optimality of the empirical Bayes decision
rule $d_n$.


Definition 2.2. A sequence of empirical Bayes decision rules $\{d_n\}_{n=1}^{\infty}$ is said to be
asymptotically optimal at least of order $\alpha_n$ relative to the (unknown) prior distribution
$G$ if $r(G, d_n) - r(G) \le O(\alpha_n)$ as $n \to \infty$, where $\{\alpha_n\}$ is a sequence of
positive numbers such that $\lim_{n \to \infty} \alpha_n = 0$.

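Now, straightforward computation shows that $\varphi(x)$ is increasing in $x$. Thus, we let
$A(\theta_0) = \{x \mid \varphi(x) > \theta_0\}$ and $B(\theta_0) = \{x \mid \varphi(x) < \theta_0\}$. Define

(2.6)  $M = \begin{cases} \min A(\theta_0) & \text{if } A(\theta_0) \ne \emptyset, \\ \infty & \text{if } A(\theta_0) = \emptyset, \end{cases}$

(2.7)  $m = \begin{cases} \max B(\theta_0) & \text{if } B(\theta_0) \ne \emptyset, \\ -1 & \text{if } B(\theta_0) = \emptyset, \end{cases}$

where $\emptyset$ denotes the empty set.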
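The two cut points can be located numerically by scanning $x$ upward, as in the sketch below. It assumes that $\varphi$ is nondecreasing and truncates the scan at a hypothetical bound `x_max`. If $\varphi$ never exceeds $\theta_0$ within that range, the function reports $M = \infty$ and the returned $m$ is only the truncation point. The two closed forms for $\varphi$ used in the usage lines are assumptions made for illustration.

```python
from math import inf


def thresholds(phi, theta0, x_max=10_000):
    """Return (m, M) as in (2.6)-(2.7) for a nondecreasing phi, scanning 0..x_max."""
    m, M = -1, inf
    for x in range(x_max + 1):
        value = phi(x)
        if value < theta0:
            m = x            # largest x seen so far with phi(x) < theta0
        elif value > theta0:
            M = x            # first x with phi(x) > theta0
            break
    return m, M


# Assumed illustration: phi(x) = (x + 1) / 2.
print(thresholds(lambda x: (x + 1) / 2, 1.5))   # phi(2) equals theta0, so m < 2 < M
print(thresholds(lambda x: (x + 1) / 2, 1.25))
```

When $\varphi(x) = \theta_0$ for some $x$, those points fall strictly between $m$ and $M$ and contribute nothing to the excess risk.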


By the increasing property of $\varphi(x)$ with respect to the variable $x$, $m \le M$; also,
$m < M$ if $A(\theta_0) \ne \emptyset$. Furthermore,

(2.8)  $x \le m$ iff $\varphi(x) < \theta_0$, and $y \ge M$ iff $\varphi(y) > \theta_0$.
The following theorem is our main result.

Theorem 2.1. Let $\{d_n^*\}$ be the sequence of empirical Bayes decision rules defined above.
Suppose that $\theta_0 < Q$. Also, assume that

(a)  $\int_0^Q \theta\, dG(\theta) < \infty$ and

(b)  $m < \infty$.

Then, $r(G, d_n^*) - r(G) \le O(\exp(-cn))$ for some positive constant $c$.


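Proof: Under Assumption (b) and by (2.8), direct computation leads to

(2.9)  $r(G, d_n^*) - r(G) = \sum_{x=0}^{m} [\theta_0 - \varphi(x)]\, P\{\varphi_n^*(x) \ge \theta_0\}\, f(x) + \sum_{x=M}^{\infty} [\varphi(x) - \theta_0]\, P\{\varphi_n^*(x) < \theta_0\}\, f(x),$

where $\sum_{x=0}^{m} \equiv 0$ if $m = -1$.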


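The nondecreasing property of $\varphi_n^*(x)$ implies

(2.10)  $P\{\varphi_n^*(x) \ge \theta_0\} \le P\{\varphi_n^*(m) \ge \theta_0\}$ for all $x \le m$,
        $P\{\varphi_n^*(x) < \theta_0\} \le P\{\varphi_n^*(M) < \theta_0\}$ for all $x \ge M$.

Combining (2.9) and (2.10), we have

(2.11)  $r(G, d_n^*) - r(G) \le b_1 P\{\varphi_n^*(m) \ge \theta_0\} + b_2 P\{\varphi_n^*(M) < \theta_0\},$

where $0 \le b_1 = \sum_{x=0}^{m} [\theta_0 - \varphi(x)]\, f(x) < \infty$ and
$0 \le b_2 = \sum_{x=M}^{\infty} [\varphi(x) - \theta_0]\, f(x) < \infty$. The finiteness of both
$b_1$ and $b_2$ is guaranteed since $\int_0^Q \theta\, dG(\theta) < \infty$ by Assumption (a).

Therefore, it suffices to consider the asymptotic behavior of both
$P\{\varphi_n^*(m) \ge \theta_0\}$ and $P\{\varphi_n^*(M) < \theta_0\}$.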
By the definition of $\varphi_n^*$, when $\varphi_n^*(M) < Q$, then $\varphi_n^*(M) \ge \varphi_n(M)$,
where $\varphi_n(\cdot)$ is the function defined in (2.2). In view of this fact and by (2.1) and (2.2),

$P\{\varphi_n^*(M) < \theta_0\} \le P\{\varphi_n(M) < \theta_0\}$

(2.12)  $= P\Bigl\{\dfrac{1}{n} \sum_{j=1}^{n} \Delta_j(M) < -t(M, \theta_0) + \Delta(M, \theta_0, n)\Bigr\},$

where

(2.13)  $\Delta_j(x) = h(x)\,[I_{\{x+1\}}(X_j) - f(x+1)] - \theta_0\, h(x+1)\,[I_{\{x\}}(X_j) - f(x)],$

(2.14)  $t(x, \theta_0) = h(x)\, f(x+1) - \theta_0\, h(x+1)\, f(x),$

(2.15)  $\Delta(x, \theta_0, n) = \delta_n\,[h(x+1)\,\theta_0 - h(x)].$
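The equality in (2.12) follows from a short rearrangement, sketched here for a generic $x$ as a reading aid. It uses only (2.1), (2.2) and the three quantities just defined.

```latex
\begin{align*}
\varphi_n(x) < \theta_0
  &\iff h(x)\, f_n(x+1) - \theta_0\, h(x+1)\, f_n(x) < 0, \\
h(x)\, f_n(x+1) - \theta_0\, h(x+1)\, f_n(x)
  &= \frac{1}{n}\sum_{j=1}^{n} \Delta_j(x)
     + \bigl[ h(x) f(x+1) - \theta_0 h(x+1) f(x) \bigr]
     + \delta_n \bigl[ h(x) - \theta_0 h(x+1) \bigr] \\
  &= \frac{1}{n}\sum_{j=1}^{n} \Delta_j(x) + t(x,\theta_0) - \Delta(x,\theta_0,n), \\
\varphi_n(x) < \theta_0
  &\iff \frac{1}{n}\sum_{j=1}^{n} \Delta_j(x) < -t(x,\theta_0) + \Delta(x,\theta_0,n).
\end{align*}
```

The same rearrangement with the inequality reversed gives the event $\{\varphi_n(x) \ge \theta_0\}$.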

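Also, by the definition of $\varphi_n^*(x)$ and by (2.1) and (2.2) again,

$P\{\varphi_n^*(m) \ge \theta_0\}$

$= P\{\varphi_n(y) \ge \theta_0 \text{ for some } y = 0, 1, \ldots, m\}$

$\le \sum_{y=0}^{m} P\{\varphi_n(y) \ge \theta_0\}$

(2.16)  $= \sum_{y=0}^{m} P\Bigl\{\dfrac{1}{n} \sum_{j=1}^{n} \Delta_j(y) \ge -t(y, \theta_0) + \Delta(y, \theta_0, n)\Bigr\}.$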

Note that $\Delta_j(x)$, $j = 1, \ldots, n$, are iid, $E[\Delta_j(x)] = 0$, and
$a_1(x, \theta_0) \le \Delta_j(x) \le a_2(x, \theta_0)$, where
$a_1(x, \theta_0) = -h(x) f(x+1) - h(x+1)\theta_0 + h(x+1)\theta_0 f(x)$ and
$a_2(x, \theta_0) = h(x) - h(x) f(x+1) + h(x+1)\theta_0 f(x)$. Also, since $\delta_n = o(1)$ and
$m < \infty$, there exists some positive integer $n_0$ such that for all $n \ge n_0$,
$|\Delta(y, \theta_0, n)| \le \frac{1}{2} |t(y, \theta_0)|$ holds for all $0 \le y \le m$
and for $y = M$. Hence, for $n$ sufficiently large, $-t(M, \theta_0) + \Delta(M, \theta_0, n) < 0$
since $t(M, \theta_0) > 0$; and $-t(y, \theta_0) + \Delta(y, \theta_0, n) > 0$ for $0 \le y \le m$
since $t(y, \theta_0) < 0$ for $0 \le y \le m$.
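In view of the above facts and by Theorem 2 of Hoeffding (1963), for $n \ge n_0$,

$P\Bigl\{\dfrac{1}{n} \sum_{j=1}^{n} \Delta_j(M) < -t(M, \theta_0) + \Delta(M, \theta_0, n)\Bigr\}$

$\le \exp\{-2n\,[-t(M, \theta_0) + \Delta(M, \theta_0, n)]^2\, a_3^{-2}(M, \theta_0)\}$

(2.17)  $\le \exp\{-\tfrac{n}{2}\, t^2(M, \theta_0)\, a_3^{-2}(M, \theta_0)\},$

and for $0 \le y \le m$,

$P\Bigl\{\dfrac{1}{n} \sum_{j=1}^{n} \Delta_j(y) \ge -t(y, \theta_0) + \Delta(y, \theta_0, n)\Bigr\}$

$\le \exp\{-2n\,[-t(y, \theta_0) + \Delta(y, \theta_0, n)]^2\, a_3^{-2}(y, \theta_0)\}$

(2.18)  $\le \exp\{-\tfrac{n}{2}\, t^2(y, \theta_0)\, a_3^{-2}(y, \theta_0)\},$

where $a_3(x, \theta_0) = a_2(x, \theta_0) - a_1(x, \theta_0) = h(x) + h(x+1)\theta_0$ is the length of
the interval containing $\Delta_j(x)$.
Let

(2.19)  $c = \tfrac{1}{2} \min\{t^2(y, \theta_0)\, a_3^{-2}(y, \theta_0) \mid 0 \le y \le m \text{ or } y = M\}.$

It is clear that $c > 0$, since $m < \infty$ by Assumption (b) and
$t^2(y, \theta_0)\, a_3^{-2}(y, \theta_0) > 0$ for all $0 \le y \le m$ and for $y = M$. Then from
(2.11), (2.12), and (2.16) to (2.19), we have

(2.20)  $r(G, d_n^*) - r(G) \le b_1 \sum_{y=0}^{m} \exp(-cn) + b_2 \exp(-cn) = O(\exp(-cn)).$

Hence, the proof of the theorem is complete.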
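To get a feel for the size of the constant $c$, the sketch below evaluates it for one concrete case. It assumes the Hoeffding form of the bound, in which the squared deviation is divided by the squared range $a_3^2(y, \theta_0) = [h(y) + h(y+1)\theta_0]^2$. The Poisson weight $h$, the marginal $f(x) = (1/2)^{x+1}$, the threshold $\theta_0 = 1.25$ and the resulting cut points $m = 1$, $M = 2$ are assumptions made for illustration, and `rate_constant` is a hypothetical name.

```python
from math import gamma, exp


def rate_constant(h, f, theta0, m, M):
    """c = (1/2) * min over y in {0..m} and y = M of (t(y) / a3(y))^2."""
    def t(y):                       # (2.14)
        return h(y) * f(y + 1) - theta0 * h(y + 1) * f(y)

    def a3(y):                      # range of Delta_j(y)
        return h(y) + h(y + 1) * theta0

    ys = list(range(m + 1)) + [M]
    return 0.5 * min((t(y) / a3(y)) ** 2 for y in ys)


# Assumed illustration: Poisson likelihood, Exp(1) prior, theta0 = 1.25.
h = lambda x: 1.0 / gamma(x + 1)
f = lambda x: 0.5 ** (x + 1)
c = rate_constant(h, f, theta0=1.25, m=1, M=2)
bound_factor = exp(-c * 100_000)    # exp(-cn) at n = 100000
print(c, bound_factor)
```

In this illustration the minimum is attained at $y = M$, where $\varphi(M)$ is closest to $\theta_0$. The guaranteed constant is small, so the bound describes the asymptotic order rather than a sharp finite-sample estimate.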
3. Examples and Remark


The following two examples were considered by Johns and Van Ryzin (1971) to
illustrate the performance of their proposed empirical Bayes decision rules for the
two-action problem. We use the same examples to illustrate the performance of the
proposed empirical Bayes decision rules $\{d_n^*\}$.

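Example 1. (The Geometric Distribution). Suppose that

$f(x \mid \theta) = \theta^x (1 - \theta), \quad x = 0, 1, 2, \ldots;\ 0 < \theta < 1;$

and that the prior distribution has the probability density function $g(\theta)$, where

$g(\theta) = (\alpha + 1)(1 - \theta)^{\alpha}, \quad 0 < \theta < 1,\ \alpha > -1.$

Then $h(x) = 1$ and $f(x) = (\alpha + 1)\,\Gamma(x+1)\,\Gamma(\alpha+2)/\Gamma(x+\alpha+3)$. Thus,
$\varphi(x) = f(x+1)/f(x) = (x+1)/(x+\alpha+3)$, which tends to 1 as $x \to \infty$. Taking
$0 < \theta_0 < 1$, we have $A(\theta_0) = \{x \mid \varphi(x) > \theta_0\} \ne \emptyset$. Therefore,
$m < M = \min A(\theta_0) < \infty$. Hence, by Theorem 2.1, $r(G, d_n^*) - r(G) \le O(\exp(-cn))$ for
some positive constant $c$.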
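The marginal and the ratio in Example 1 can be checked numerically. The sketch below assumes the Beta-integral form of the marginal, $f(x) = (\alpha+1)\,B(x+1, \alpha+2)$, and compares it with a plain midpoint-rule integration of $\theta^x(1-\theta)\,g(\theta)$. The function names and the step count are hypothetical.

```python
from math import lgamma, exp


def f_geometric(x, alpha):
    """Marginal f(x) = (alpha + 1) * B(x + 1, alpha + 2), via log-gamma."""
    return (alpha + 1.0) * exp(
        lgamma(x + 1) + lgamma(alpha + 2) - lgamma(x + alpha + 3)
    )


def phi_geometric(x, alpha):
    """phi(x) = f(x + 1) / f(x), since h(x) = 1 in this example."""
    return f_geometric(x + 1, alpha) / f_geometric(x, alpha)


def f_by_quadrature(x, alpha, steps=20_000):
    """Midpoint-rule integral of theta^x (1 - theta) g(theta) over (0, 1)."""
    width = 1.0 / steps
    total = 0.0
    for k in range(steps):
        theta = (k + 0.5) * width
        total += theta ** x * (1 - theta) * (alpha + 1) * (1 - theta) ** alpha
    return total * width


alpha = 0.5
ratios = [phi_geometric(x, alpha) for x in range(5)]
print(ratios)
```

The ratios increase toward 1, so for any $\theta_0$ in $(0, 1)$ some finite $x$ has $\varphi(x) > \theta_0$.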

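Example 2. (The Poisson Distribution). Let

$f(x \mid \theta) = e^{-\theta}\theta^x / \Gamma(x+1), \quad x = 0, 1, 2, \ldots;\ \theta > 0.$

Letting the prior density function be $g(\theta) = e^{-\theta}$, $\theta > 0$, we then have
$f(x) = \int_0^{\infty} e^{-2\theta}\theta^x / \Gamma(x+1)\, d\theta = (1/2)^{x+1}$ and
$h(x) = 1/\Gamma(x+1)$. Thus, $\varphi(x) = h(x) f(x+1) / [h(x+1) f(x)] = (x+1)/2$, which tends
to infinity as $x$ tends to infinity. Therefore, for any finite $\theta_0 > 0$, $m < \infty$. Then by
Theorem 2.1, $r(G, d_n^*) - r(G) \le O(\exp(-cn))$ for some positive constant $c$.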
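A small Monte Carlo experiment can illustrate the behavior of the excess risk in the Poisson example. The sketch below is not taken from the paper. It assumes $\theta_0 = 1.25$ and $\delta_n = 1/n$, and it truncates the sum over $x$ at 60. Past observations are drawn directly from the marginal $f(x) = (1/2)^{x+1}$, which suffices because the rule depends on the data only through the $X_j$. All function names are hypothetical.

```python
import random
from collections import Counter

THETA0 = 1.25   # assumed threshold
X_MAX = 60      # truncation point for the sum over x (the tail mass is negligible)


def phi(x):
    return (x + 1) / 2.0


def f(x):
    return 0.5 ** (x + 1)


def sample_marginal(rng):
    """Draw X with P(X = x) = (1/2)^(x+1) by counting fair-coin successes."""
    x = 0
    while rng.random() < 0.5:
        x += 1
    return x


def conditional_regret(past):
    """Excess risk of d*_n over the Bayes rule, given the past sample."""
    n = len(past)
    delta = 1.0 / n                                  # assumed delta_n
    counts = Counter(past)
    f_n = lambda x: counts.get(x, 0) / n + delta
    regret, running_max = 0.0, 0.0
    for x in range(X_MAX + 1):
        phi_n = (x + 1) * f_n(x + 1) / f_n(x)        # h(x) / h(x+1) = x + 1
        running_max = max(running_max, phi_n)        # phi*_n(x); Q is infinite here
        d_star = 1 if running_max >= THETA0 else 0
        d_bayes = 1 if phi(x) >= THETA0 else 0
        regret += (THETA0 - phi(x)) * (d_star - d_bayes) * f(x)
    return regret


def estimate_regret(n, reps, seed=0):
    """Average the conditional regret over independent past samples of size n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        past = [sample_marginal(rng) for _ in range(n)]
        total += conditional_regret(past)
    return total / reps


small_n = estimate_regret(25, reps=50)
large_n = estimate_regret(20_000, reps=3)
print(small_n, large_n)
```

Each term of the conditional regret is nonnegative, and the total cannot exceed $b_1 + b_2 = 0.625$ in this setting. For large $n$ the empirical rule agrees with the Bayes rule on essentially every sample, so the estimate is close to zero.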


Johns and Van Ryzin (1971) considered several conditions on the tail behavior of the
prior probability density function under which their proposed empirical Bayes decision
rules may achieve the best possible convergence rate $\alpha_n = n^{-1}$. We also apply those
conditions to the sequence of empirical Bayes decision rules $\{d_n^*\}$. We state the result
as a corollary without restating those conditions. The reader is referred to Johns and
Van Ryzin (1971) for details.

Corollary 3.1. Let $\{d_n^*\}$ be the sequence of empirical Bayes decision rules defined
in Section 2. Suppose that $\int_0^Q \theta\, dG(\theta) < \infty$. Then, either under the
assumptions in Theorem 3 or under the assumptions in Theorem 4 of Johns and Van Ryzin
(1971), we have $r(G, d_n^*) - r(G) \le O(\exp(-cn))$ for some positive constant $c$.

Proof: We need only verify that $A(\theta_0) \ne \emptyset$ under each assumption. This follows
directly from Lemmas 4, 5 and 6 of Johns and Van Ryzin (1971).